- Pros and Cons of base R graphics, lattice, and ggplot2.
- Bar plots.
- Box plots.
- Histograms/density plots.
- Themes.
- Combine multiple plots onto one page (ggarrange).
- Plotting data using Google maps (ggmap).
September 6 2019
Base R and lattice are two different approaches to data visualization in R:
What aesthetic would you set if you wanted to change the color of a bar plot?
You may use Google to find the answer.
The correct choice is D.
The color aesthetic only changes the border of the bars; the fill aesthetic changes their interior.
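The difference between the two aesthetics can be checked with a small sketch. It uses the built-in mtcars data as a stand-in for the course data (an assumption; the names `cars`, `border_only`, and `filled` are illustrative only).

```r
library(ggplot2)

# Built-in mtcars stands in for the course data (assumption).
cars <- transform(mtcars, cyl = factor(cyl))

# color = ... only outlines the bars; the interior keeps the default fill.
border_only <- ggplot(cars, aes(x = cyl)) +
  geom_bar(color = "firebrick3")

# fill = ... changes the interior of the bars.
filled <- ggplot(cars, aes(x = cyl)) +
  geom_bar(fill = "firebrick3")
```

Printing `border_only` and `filled` one after the other should make the difference visible.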
Calculate means manually before plotting.
Start with the basic plot.
https://ggplot2.tidyverse.org/reference/geom_bar.html
# Using dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>%
filter(!(admin_source=="Missing" | admin_source=="Other")) %>%
group_by(admin_type,admin_source) %>%
summarise(mean_med = mean(num_medications))
# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) +
ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
xlab("Admission Type") + ylab("Mean Number of Medications") +
geom_bar(stat="identity", position = position_dodge())
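Since DIABETES_CLASS is not bundled with R, the same two steps (aggregate, then plot) can be tried on a built-in dataset. This is only a sketch under that substitution: mtcars stands in for the course data, with `cyl` playing the role of admin_type, `am` the role of admin_source, and `mpg` the role of num_medications.

```r
library(dplyr)
library(ggplot2)

# mtcars is a stand-in for DIABETES_CLASS (assumption).
car_means <- mtcars %>%
  mutate(cyl = factor(cyl), am = factor(am)) %>%
  group_by(cyl, am) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# stat = "identity" takes the bar heights straight from mean_mpg;
# position_dodge() places the am groups side by side within each cyl.
car_bars <- ggplot(car_means, aes(x = cyl, y = mean_mpg, fill = am)) +
  geom_bar(stat = "identity", position = position_dodge())
```

Without `position_dodge()`, the bars for each `am` group would be stacked on top of one another instead.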
Use scale_fill_manual() to manually change the fill colors and the legend title.
# Using dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>%
filter(!(admin_source=="Missing" | admin_source=="Other")) %>%
group_by(admin_type,admin_source) %>%
summarise(mean_med = mean(num_medications))
# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) +
ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
xlab("Admission Type") + ylab("Mean Number of Medications") +
geom_bar(stat="identity", position = position_dodge()) +
scale_fill_manual(values=c("firebrick3","seagreen4","purple3","steelblue3"),
name="Admission Source")
Add the means for each bar via geom_text().
https://ggplot2.tidyverse.org/reference/geom_text.html
# Using dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>%
filter(!(admin_source=="Missing" | admin_source=="Other")) %>%
group_by(admin_type,admin_source) %>%
summarise(mean_med = mean(num_medications))
# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) +
ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
xlab("Admission Type") + ylab("Mean Number of Medications") +
geom_bar(stat="identity", position = position_dodge()) +
scale_fill_manual(values=c("firebrick3","seagreen4","purple3","steelblue3"),
name="Admission Source") +
geom_text(aes(label=round(mean_med,1)),
vjust=1, position = position_dodge(.9),size=3, color="white")
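The labelling step can be tried in isolation with a small sketch. The data frame `toy_means` and its values are hypothetical, not course data. Bars are 0.9 units wide by default, which is why the text is dodged by the same amount.

```r
library(ggplot2)

# Hypothetical, already-aggregated means (not course data).
toy_means <- data.frame(
  group      = rep(c("A", "B"), each = 2),
  source     = rep(c("X", "Y"), times = 2),
  mean_value = c(10.26, 12.5, 8.04, 15.72)
)

labelled_bars <- ggplot(toy_means, aes(x = group, y = mean_value, fill = source)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  # Dodging the text by 0.9 (the default bar width) centres each label on its bar.
  # vjust = 1 aligns the top of the text with the bar height, so it sits just inside the bar.
  geom_text(aes(label = round(mean_value, 1)),
            vjust = 1, position = position_dodge(0.9),
            size = 3, color = "white")
```

If the dodge width of the text did not match the bars, the labels would drift toward the middle of each group.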
First, we filter out the rows where race is "Missing" or "Other".
After that, the usual parameters are set.
https://ggplot2.tidyverse.org/reference/geom_boxplot.html
# Filter out the "Other" and "Missing" race categories
DIABETES_FILTER <- DIABETES_CLASS %>% filter(!(race=="Other" | race=="Missing"))
# Plotting
boxplot_exp <- ggplot(DIABETES_FILTER,aes(x=race,y=time_in_hospital,fill=sex)) +
ggtitle("Box Plots of Race Versus Time Spent in Hospital Grouped by Sex") +
xlab("Race") + ylab("Time Spent in Hospital (Hours)") +
geom_boxplot(outlier.shape=NA) +
scale_fill_manual(values = c("orchid3","deepskyblue3"),
name="Sex")
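A runnable sketch of the same filter-then-plot pattern, assuming the built-in ToothGrowth data as a stand-in for DIABETES_CLASS (`dose` plays the role of race and `supp` the role of sex):

```r
library(dplyr)
library(ggplot2)

# ToothGrowth is a stand-in for the course data (assumption).
tooth <- ToothGrowth %>%
  filter(dose != 1) %>%            # drop one category, as the race filter does
  mutate(dose = factor(dose))

tooth_box <- ggplot(tooth, aes(x = dose, y = len, fill = supp)) +
  # outlier.shape = NA hides the outlier points; the boxes and whiskers
  # are still computed from all of the data.
  geom_boxplot(outlier.shape = NA) +
  scale_fill_manual(values = c("orchid3", "deepskyblue3"), name = "Supplement")
```

Because `fill` is mapped to a second variable, each x category gets one box per fill level, placed side by side.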
Plot the histogram via geom_histogram().
https://ggplot2.tidyverse.org/reference/geom_histogram.html
density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
ggtitle("Distribution of Weight by Sex") +
xlab("Weight (lbs.)") + ylab("Density") +
geom_histogram(mapping = aes(y=stat(density)),
binwidth=5,
position="identity",
alpha=.1)
Set the aesthetics manually.
density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
ggtitle("Distribution of Weight by Sex") +
xlab("Weight (lbs.)") + ylab("Density") +
geom_histogram(mapping = aes(y=stat(density)),
binwidth=5,
position="identity",
alpha=.1) +
scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
scale_color_manual(values = c("deeppink4","deepskyblue4"))
Plot the density over each histogram via geom_density().
https://ggplot2.tidyverse.org/reference/geom_density.html
density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
ggtitle("Distribution of Weight by Sex") +
xlab("Weight (lbs.)") + ylab("Density") +
geom_histogram(mapping = aes(y=stat(density)),
binwidth=5,
position="identity",
alpha=.1) +
scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
scale_color_manual(values = c("deeppink4","deepskyblue4")) +
geom_density(alpha=.1)
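The reason for mapping y to the computed density can be sketched on synthetic data (the `toy` data frame below is made up, not the course data). Note that the sketch uses `after_stat(density)`, the spelling used in newer ggplot2 releases (3.3.0 and later) for what the slides write as `stat(density)`.

```r
library(ggplot2)

set.seed(1)
# Synthetic weights (assumption, not the course data).
toy <- data.frame(
  weight = c(rnorm(200, mean = 160, sd = 20), rnorm(200, mean = 190, sd = 25)),
  sex    = rep(c("Female", "Male"), each = 200)
)

density_toy <- ggplot(toy, aes(x = weight, fill = sex, color = sex)) +
  # y = density rescales each group's bars so that their areas sum to 1,
  # which puts the histogram on the same scale as geom_density().
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 position = "identity", alpha = 0.1) +
  geom_density(alpha = 0.1)

# Check: bar area (density * bin width) sums to 1 within each group.
hist_data <- layer_data(density_toy, 1)
bar_area  <- tapply(hist_data$density * (hist_data$xmax - hist_data$xmin),
                    hist_data$group, sum)
```

With the default y (counts), the density curves would be squashed flat along the x-axis because counts and densities live on very different scales.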
Recall from yesterday some of the pre-defined themes in ggplot2.
Today we will take a look at the theme() function and how it can be used to modify individual parts of a theme.
Like all other functions, theme() has many different parameters we can set.
For example:
For a complete list:
The tricky part is that some of these parameters require you to define an element_ object.
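As a rough guide, text parameters take element_text(), line parameters take element_line(), rectangle parameters take element_rect(), and element_blank() removes a component entirely. A minimal sketch on the built-in mtcars data (the names `my_theme` and `themed_plot` are illustrative only):

```r
library(ggplot2)

my_theme <- theme(
  plot.title        = element_text(face = "bold"),   # text component
  axis.ticks        = element_line(color = "red"),    # line component
  panel.background  = element_rect(fill = "white"),   # rectangle component
  panel.grid.minor  = element_blank(),                # draw nothing
  axis.ticks.length = unit(0.25, "cm")                # a plain unit, no element_ object
)

themed_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Weight vs. MPG") +
  my_theme
```

Passing the wrong kind of object (for example a plain string where an element_text() is expected) makes theme() raise an error, which is usually the quickest hint about which element_ function a parameter needs.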
Links:
Let's use the third example to test out theme().
Say I would like to make the following 5 changes to the theme of the plot:
Then the call to theme would look like this:
theme(legend.background = element_rect(color="blue"),
plot.title = element_text(family = "serif", face = "bold"),
axis.ticks = element_line(color="red"),
axis.ticks.length = unit(.25, "cm"),
legend.margin = margin(t=.5,b=.5,r=1,l=1,"cm"))
ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
ggtitle("Distribution of Weight by Sex") +
xlab("Weight (lbs.)") + ylab("Density") +
geom_histogram(mapping = aes(y=stat(density)),
binwidth=5,
position="identity",
alpha=.1) +
scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
scale_color_manual(values = c("deeppink4","deepskyblue4")) +
geom_density(alpha=.1) +
#Theme modifications
theme(legend.background = element_rect(color="blue"),
plot.title = element_text(family = "serif", face = "bold"),
axis.ticks = element_line(color="red"),
axis.ticks.length = unit(x=.25, units="cm"),
legend.margin = margin(t=.5,b=.5,r=1,l=1,unit="cm"))
What parameter would you set in theme() if you wanted to place the legend at the top of the plot? How would you set this parameter?
Hint: this link lists all the settable parameters within the theme() function as well as the requirements for setting them.
The correct call to theme() would look like this:
theme(legend.position = "top")
#Get mean weight for each sex.
DIABETES_WEIGHT_MEANS <- DIABETES_CLASS %>% group_by(sex) %>%
summarise(mean_weight = mean(weight,na.rm=TRUE))
# Plotting
ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
ggtitle("Distribution of Weight by Sex") +
xlab("Weight (lbs.)") + ylab("Density") +
geom_histogram(mapping = aes(y=stat(density)),
binwidth=5,
position="identity",
alpha=.1) +
scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
scale_color_manual(values = c("deeppink4","deepskyblue4")) +
geom_density(alpha=.1) +
geom_vline(data=DIABETES_WEIGHT_MEANS,
aes(xintercept=mean_weight,color=sex),
linetype="dashed") +
#Theme modifications
theme(legend.position = "top")
ggarrange() is a function from the ggpubr package that allows us to arrange multiple ggplots on the same page.
You would usually do this when you have multiple plots that show different parts of a relationship.
Today we will use the plots that we made.
The typical call to ggarrange looks like this:
ggarrange(…, labels, ncol, nrow)
https://www.rdocumentation.org/packages/ggpubr/versions/0.2.2/topics/ggarrange
For example, if we wanted to arrange the bar plot and box plot from the examples such that the plots are placed above one another,
then the call to ggarrange would look like this:
ggarrange(barplot_exp,boxplot_exp, ncol = 1, nrow = 2)
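A self-contained sketch of the same call, assuming ggpubr is installed and using two throwaway plots built from mtcars in place of barplot_exp and boxplot_exp:

```r
library(ggplot2)
library(ggpubr)   # provides ggarrange()

# Two stand-in plots (assumption: not the course plots).
p_top    <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p_bottom <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# One column, two rows: p_top is drawn above p_bottom.
# labels adds a tag ("A", "B") to each panel.
stacked <- ggarrange(p_top, p_bottom, labels = c("A", "B"), ncol = 1, nrow = 2)
```

Swapping to `ncol = 2, nrow = 1` would place the same two plots side by side instead.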
Arrange two of the three plots you created today.
ggmap is a package that allows you to obtain map data from Google Maps and plot it using the ggplot framework.
Unfortunately, due to recent changes to the Google API, using ggmap means you must have an API key.
Moreover, users are required to enter valid credit card information to register for an API key.
This means that we won't be able to do interactive examples in ggmap.
Nonetheless, we can still look at some examples.
This example shows densities of lightning strikes in Houston.
This plot relies on latitude and longitude data collected from the World Wide Lightning Location Network.
head(lightning_raw)
     lat     lon
1 29.775 -94.649
2 30.240 -94.270
3 29.803 -94.418
4 29.886 -94.342
5 29.892 -94.085
6 29.898 -94.071
Obtaining the map of Houston requires knowing its latitude and longitude coordinates.
library(ggmap)
houston <- c(lon = -95.36, lat = 29.76)
houston_map <- get_map(location = houston, zoom = 8, color = "bw")
Now we can use ggmap() in place of ggplot().
Otherwise, we build the plot as we are used to.
ggmap(houston_map, maprange=FALSE) +
stat_density_2d(data = lightning_raw,
aes(x = lon, y = lat, fill = ..level.., alpha = ..level..),
color="blue",size = 0.01, bins = 16, geom = 'polygon') +
scale_fill_gradient(low = "green", high = "red") +
scale_alpha(range = c(0.00, 0.25), guide = FALSE) +
theme(legend.position = "none",
axis.title = element_blank(),
text = element_text(size = 12))
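Because get_map() needs an API key, the density layer itself can be tried without a basemap. The sketch below is an approximation under stated assumptions: the `strikes` points are synthetic (not World Wide Lightning Location Network data), plain ggplot() replaces ggmap(), and it uses the newer spellings `after_stat(level)` and `guide = "none"` for what the slides write as `..level..` and `guide = FALSE`.

```r
library(ggplot2)

set.seed(42)
# Synthetic strike locations scattered around Houston (assumption).
strikes <- data.frame(
  lon = rnorm(500, mean = -95.36, sd = 0.4),
  lat = rnorm(500, mean = 29.76, sd = 0.3)
)

density_layer <- ggplot(strikes, aes(x = lon, y = lat)) +
  # Each contour level of the 2D density estimate becomes one filled polygon;
  # higher levels are drawn redder and more opaque.
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", bins = 16) +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0, 0.25), guide = "none") +
  coord_quickmap()   # approximate map aspect ratio for lon/lat data
```

With a registered key, replacing `ggplot(strikes, ...)` by a ggmap() call and passing `data = strikes` to the density layer would give the layout used on the slide.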
More examples
R for Data Science is an excellent resource for learning more about ggplot.
We recommend you try the following exercises as homework: